1. Technical Field
The invention disclosed broadly relates to data processing technology and more particularly relates to improvements in information retrieval.
2. Background Art
For the last two decades the retrieval of documents using a computer has been a prominent application in both business and library science. Two methods of preparing and retrieving documents have become established in the state of the art. They are:
Manual Generation of Key Words: At the time of document archival, operator intervention is required to manually attach to the document a set of terms that, in the opinion of the operator, describe the content or theme of the document being stored. The words or phrases may or may not occur within the document and represent a subjective judgement by the operator as to how the document may be queried in the future.
Contextual: Prior to document archival, each word in the document text is reviewed and, based on a criterion or set of criteria, words and phrases are chosen as retrieval terms for the subject document. In its simplest form, each word in the document text can be viewed as a retrieval term. Alternatively, elaborate grammatical criteria can be used to narrow the selection of key words from the document text to those words which, based on linguistic and information science methodology, are determined to have a greater level of specificity and to be of more use in later retrieval.
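By way of illustration only, the contextual method can be sketched as the construction of an inverted index that maps each retrieval term to the documents containing it. The sketch below is not taken from any of the prior art systems discussed here. The names `tokenize`, `build_index`, and `STOP_WORDS`, the sample documents, and the use of a simple stop-word filter in place of the elaborate grammatical criteria are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list. It is a simple stand-in for the
# grammatical selection criteria, which the text does not specify.
STOP_WORDS = frozenset({"the", "a", "an", "of", "and", "in", "is", "to"})


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(documents, keep=lambda word: True):
    """Map each retained word to the set of document ids containing it.

    `keep` is the selection criterion. The default keeps every word,
    which corresponds to the simplest form of the contextual method.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            if keep(word):
                index[word].add(doc_id)
    return index


# Illustrative documents (assumed, not from the source).
documents = {
    "doc1": "The retrieval of documents in a library",
    "doc2": "Archiving of images and graphics",
}

# Simplest form: every word in the text is a retrieval term.
full_index = build_index(documents)

# Narrowed form: only words passing the criterion become retrieval terms.
filtered_index = build_index(documents, keep=lambda w: w not in STOP_WORDS)
```

In this sketch, a query for a term is a lookup in the index, for example `filtered_index["library"]`. Note that only the words of the text are indexed, so an image or graphic in a document contributes no retrieval terms under this approach.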
An example of a manually generated key word retrieval system is the IBM PROFS System, and an example of a contextual system for document text is the IBM STAIRS program product. Both of these are examples of host computer based information retrieval systems. An example of a contextual information retrieval system for document text that operates on a personal computer or a local area network is the IBM Search Vision product.
The prior art has not provided an efficient means for archiving documents having mixed object types of both text and non-text objects. In the prior art, if an archivist were attempting to archive a document which included images or graphics, the archivist would manually add descriptive terms as key words, using his own judgement as to which words most appropriately describe the image or graphic. The prior art has failed to provide a contextual approach to archiving documents having non-text objects. Furthermore, non-text objects contained within a document are not independently accessible in prior art information retrieval systems.